Preparation of the human von Willebrand factor by recombinant DNA.

Full length human von Willebrand factor (vWF) cDNA was assembled
from partial, overlapping vWF cDNAs. This cDNA construct includes an
"open" reading frame of 2,813 amino acid residues, apparently
representing a (pre)-pro-sequence of 763 and mature vWF of 2,050
amino acids. Both

with such a cDNA coding or fragments thereof respectively,
microorganisms and animal or human cells which contain such plasmids
or phages, preparation of proteins by the cultivation of the above
mentioned hosts, the obtained proteins and pharmaceutical
compositions containing a biologically active form of the obtained
proteins.

The von Willebrand factor (vWF) is a large, multimeric plasma
protein, composed of an apparently single glycoprotein subunit with
a molecular weight (MW) of about 225 kD. These subunits are linked
together by disulfide bonds. In plasma vWF circulates as multimers,
ranging from dimers to multimers of more than 50 subunits. Dimers
consist of two subunits joined, probably at their C-termini, by
flexible "rod-shaped" domains and are presumed to be the protomers
in multimerization. The protomers are linked through large,
probably N-terminal, globular domains to form multimers.

vWF is synthesized by endothelial cells and megakaryocytes. It
is believed that this protein is initially produced as a 260 kD
glycosylated precursor that is subsequently subjected to
carbohydrate processing, sulfation, dimerization, multimerization
and proteolytic cleavage to yield the mature 225 kD subunit (Sporn,
L.A. et al. (1985) Biosynthesis of vWF protein in human
megakaryocytes, J. Clin. Invest. 76, 1102-1106; Wagner, D.D., et al.
(1984) J. of Cell Biology 99, 2123-2130). It has been shown that vWF
is stored in the Weibel-Palade bodies within the endothelial cells.
It cannot be excluded that these organelles play a role in the
processing of the precursor protein.
vWF participates in critical steps in hemostasis. It is involved
in platelet-vessel wall interactions after vascular injury,
leading to platelet plug formation. On the vWF protein, domains have
been assigned which show specific interaction with the platelet
glycoproteins IB (Jenkins, C.S.P., et al. (1976) J. Clin. Invest.
69, 1212-1222), IIB/IIIA (Fujimoto, T., et al. (1982) Adenosine
diphosphate induces binding of von Willebrand factor to human
platelets, Nature 297, 154-156), collagens type I and III and with
another, yet unidentified, component in the subendo

assignments are based on studies with monoclonal anti-vWF anti-

For an in-depth analysis of structure-function relationships of the
vWF protein, a full length vWF cDNA will be indispensable.
Introduction of well-defined mutations within this cDNA and
expression of the mutated cDNA in a suitable host will allow a
detailed localization of functional domains within the vWF protein.
Recently, applicants and others (Lynch, D.C., et al. (1985) Cell 41,
49-56; Ginsburg, M., et al. (1985) J. Biol. Chem. 260, 3931-3936;
Verweij, C.L., et al. (1985) Nucleic Acids Research 13, 4699-4717,
based on Dutch patent application 85.00961; Sadler, J.E., et al.
(1985) Proc. Natl. Acad. Sci. USA 82, 6394-6398) have cloned partial
vWF cDNA sequences. The presence of a short 3' untranslated region
(136

that the precursor protein for vWF has a MW considerably larger
than the reported 260 kD (Wagner, D.D., et al. (1984) J. of Cell
Biology 99, 2123-2130). A full length vWF cDNA will enable us to
elucidate the enigma of the MW of the precursor, characterize its
processing pathway and establish the primary structure.

This invention relates to the isolation and the nucleotide
sequence of cDNAs spanning the entire vWF mRNA, and to the assembly
of these sequences into a full length, functional vWF cDNA. Further,
this invention relates to plasmids and phages containing the above
indicated vWF cDNAs, as well as to microorganisms such as bacteria
and fungi and to animal or human cells containing said plasmids
or phages. Moreover, the proteins prepared by cultivating the above
indicated hosts and the pharmaceutical compositions containing a
biologically active form of the obtained proteins also fall
within the scope of the present invention. In this respect it is
emphasized that by means of such pharmaceutical compositions it is
possible to circumvent the safety problems associated with the
administration of vWF protein recovered from blood samples of
normal individuals.

To perform the investigations forming the basis of the
invention, use has been made of the materials and methods below.



MATERIALS AND METHODS
cDNA cloning: 

cells, derived from veins of human umbilical cords (Verweij,
C.L., et al. (1985) Nucleic Acids Research 13, 4699-4717, based on
Dutch patent application 85.00961). Primer-directed cDNA was
synthesized essentially according to a described protocol
(Gubler, U., et al. (1983) Gene 25, 263-269; Toole, J.J., et al.
(1985) Nature 312, 342-347). The cDNA synthesis was arrested by
adding EDTA and SDS to final concentrations of, respectively, 20
mM and 0.1 %. The cDNA preparations were extracted with
phenol-chloroform, then precipitated with ethanol and purified by
chromatography on Sephadex G-50. In the case of primer-directed cDNA
synthesis with primer A (5' CACAGGCCACACGTGGGAGC 3'), complementary
to nucleotides 6901 till 6921, the cDNA preparation was digested
with BglII (positions 6836 and 2141) and KpnI (position 4748).
Subsequently, the digested cDNA was size-fractionated by
chromatography on a Sepharose CL-4B column. Fractions containing cDNA
larger than about 600 bp were ligated to plasmid pMBL11, digested
with BglII and KpnI. A cDNA library of about 15,000 independent
colonies was established, using strain E. coli DH1 as a host, which
was screened with two oligonucleotide probes (B and C). Probe B (5'
GAGGCAGGATTTCCGGTGAC 3'), complementary to nucleotides 4819 till
4839, was employed for the isolation of the plasmid pvWF2084,
harboring a 2,084 bp BglII-KpnI vWF cDNA fragment, while probe C
(5' CAGGGACACCTTTCCAGGGC 3'), complementary to nucleotides 2467 till
2487, was used for the detection of plasmid pvWF2600, harboring an
approximately 2,600 bp KpnI-BglII vWF cDNA fragment.
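
Because primer A and probes B and C are described as complementary to
stretches of the vWF sequence, the stretch each one anneals to is its
reverse complement. The following is a minimal sketch of that
relationship; the three sequences are those quoted above, while the
function and variable names are illustrative only and not part of the
described procedure.

```python
# Oligonucleotides A, B and C as quoted in the text (written 5' -> 3').
OLIGOS = {
    "A": "CACAGGCCACACGTGGGAGC",
    "B": "GAGGCAGGATTTCCGGTGAC",
    "C": "CAGGGACACCTTTCCAGGGC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' -> 3'."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError(f"unexpected base in {seq!r}")
    return seq.translate(_COMPLEMENT)[::-1]


def target_sites(oligos: dict) -> dict:
    """Sense-strand stretch (written as DNA) that each oligo would anneal to."""
    return {name: reverse_complement(seq) for name, seq in oligos.items()}


if __name__ == "__main__":
    for name, site in target_sites(OLIGOS).items():
        # Each oligonucleotide is a 20-mer, so each target site is 20 nt long.
        print(name, len(site), site)
```

In the mRNA itself the T residues of these target stretches would be U.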

Using probe C for primer-directed synthesis applicants divided
the resulting cDNA preparation in two parts. One part was C-tailed
and annealed to G-tailed plasmid pUC9 as described before (Verweij,
C.L., et al. (1985) Nucleic Acids Research 13, 4699-4717, based on
Dutch patent application 85.00961) and used to transform E. coli
DH1. Six thousand independent colonies were hybridized with a "nick
translated" 575 bp BglII-BamHI vWF cDNA fragment of pvWF2600 DNA. A
positive clone, harboring a plasmid with the longest insert (about
1,800 bp, designated pvWF1800) was chosen for further study. The

dNTPs to ensure blunt-ended termini (Maniatis, T., et al. (1982)
Molecular Cloning Laboratory Manual, Cold Spring Harbor
Laboratory). Phosphorylated EcoRI linkers (New England Biolabs,
Beverly, MA) were ligated to the termini of the cDNA preparation and
unreacted components were removed by Sephadex G-50 chromatography.
The XhoI site, located about 350 bp downstream of the 5' end of the
vWF cDNA insert of pvWF1800 DNA, was used for another selection.
After digestion with an excess of EcoRI and XhoI, chromatography on
Sepharose CL-4B was employed to remove digested EcoRI linkers. The
final preparation was ligated to plasmid pMBL11 DNA which had been
digested with EcoRI plus XhoI and used to transform E. coli DH1. A
collection of about 10,000 independent colonies was hybridized with
a "nick translated" 350 bp XhoI-HindIII vWF cDNA fragment from
plasmid pvWF1800. The HindIII site of this fragment has been
derived from the polylinker of the vector pUC9. A positive clone,
harboring the longest insert (about 1,330 bp, designated pvWF1330)
was further studied.
S1 nuclease protection analysis:

As a probe for S1 nuclease protection experiments applicants used
an XhoI-EcoRI fragment of about 5,300 bp (probe V) from plasmid
pvWF2600 which contains a 2,390 bp segment (XhoI-KpnI) constituted
of vWF cDNA (Fragment V, Figure 1). Probe II was a 4,800 bp
XbaI-EcoRI fragment from plasmid pvWF1330 which harbors a 735 bp
XbaI-XhoI vWF cDNA segment (Fragment II, Figure 1). The fragments
were 3' end labelled, using DNA polymerase I (large fragment) (New
England Biolabs, Beverly, MA) to fill in recessed ends (Maniatis,
T., et al. (1982) Molecular Cloning Laboratory Manual, Cold Spring
Harbor Laboratory). Subsequently, these probes were isolated by
electrophoresis on a 0.7 % low-melting agarose gel and purified as
described (Wieslander, L. (1979) Anal. Biochem. 48, 305-309).

Three other vWF cDNA fragments were subcloned in double
stranded M13mp18 and employed as probes. To that end the anti-sense

plasmid pvWF1330 (Fragment I, Figure 1) and a 575 bp BamHI-BglII
fragment of plasmid pvWF2600 (Fragment IV, Figure 1). After DNA
synthesis, initiated at the M13 primer, double stranded DNA was
digested with both HindIII and PvuII for fragment III (to yield probe
III), with both XbaI and EcoRI for fragment I (to yield probe I) and
with both BamHI and PvuII for fragment IV (to yield probe IV).
The rationale for the construction of probes II, III, IV and V is
that they contain a segment of vector DNA non-complementary with
endothelial RNA. For example, probes III and IV harbor about 200 bp
derived from M13mp18. These probes were subjected to
electrophoresis on a 5 % polyacrylamide - 8 M urea gel and the
fragments of interest were isolated (Maxam, A.M., et al. (1980)
Methods Enzymol. 65, 499-560).

S1 nuclease protection experiments were carried out essentially
as described (Berk, A.J., et al. (1977) Cell 12, 721-732). One
microgram of human endothelial polyA RNA was added to 10,000 -
100,000 counts per min. of radiolabeled probe, heated for 10 min.
at 80°C, followed by an incubation overnight at 60°C for probes I,
II, III and V and at 57°C for probe IV. Digestion was carried out
with 200 units of S1 nuclease (Bethesda Research Lab., Gaithersburg,
Md.) per ml for 20 min. at 45°C. Undigested DNA was precipitated with
ethanol and the pellets were dissolved in the appropriate loading
buffer for electrophoresis on a 0.8 % alkaline agarose gel or on a
6 % polyacrylamide sequencing gel (Maniatis, T., et al. (1982)
Molecular Cloning Laboratory Manual, Cold Spring Harbor Laboratory).
The first procedure was employed for probes I and III, whereas the
second one was applied for probes II, IV and V.
Assembly of full length vWF cDNA: 
For the construction of plasmid pSP6330vWF, harboring a
continuous vWF cDNA segment of about 6,331 bp extending from the
HindIII site (position 2235) till the SacI site within the 3'
untranslated region (position 8562) the following vWF cDNA fragments
were isolated: the 2,517 bp HindIII-KpnI fragment (position 2236
till 4753) from pvWF2600 DNA, the 2,084 bp KpnI-BglII fragment
(position 4753 till 6837) from pvWF2084 DNA and the 1,730 bp
BglII-SacI fragment (position 6837 till 8567) from pvWF2280 DNA.
These three vWF cDNA fragments were ligated simultaneously into the

7035-7056), digested with both HindIII and SacI. About half of the
resulting transformants contained a plasmid (denoted pSP6330vWF)
with the desired vWF cDNA insert of about 6,331 bp, as verified by
restriction enzyme analysis.
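
The fragment coordinates quoted for pSP6330vWF can be checked against
the stated fragment lengths and the stated insert size. The sketch
below assumes that a fragment's length is simply the difference of its
two quoted positions; the data structure and function name are
illustrative.

```python
# (fragment, start position, end position, stated length in bp, source plasmid)
FRAGMENTS = [
    ("HindIII-KpnI", 2236, 4753, 2517, "pvWF2600"),
    ("KpnI-BglII", 4753, 6837, 2084, "pvWF2084"),
    ("BglII-SacI", 6837, 8567, 1730, "pvWF2280"),
]


def check_assembly(fragments):
    """Verify each stated length and that adjacent fragments abut; return the total."""
    total = 0
    previous_end = None
    for name, start, end, stated, _source in fragments:
        if end - start != stated:
            raise ValueError(f"{name}: positions give {end - start} bp, text states {stated}")
        if previous_end is not None and start != previous_end:
            raise ValueError(f"{name}: does not start where the previous fragment ends")
        total += stated
        previous_end = end
    return total


if __name__ == "__main__":
    # Sum of the three fragments; the text quotes an insert of about 6,331 bp.
    print(check_assembly(FRAGMENTS))
```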

For the construction of plasmid pSP8800vWF, harboring full
length vWF cDNA, the following fragments were isolated: the 6,330
bp HindIII-EcoRI insert of plasmid pSP6330vWF (the EcoRI site is
derived from the polylinker present on the vector), the 1,044 bp
XhoI-HindIII fragment (position 1092 till 2236) from pvWF1800 and
the 1,327 bp EcoRI-XhoI fragment (position -236 till 1092) from
pvWF1330. A five-fold molar excess of each of these three fragments
was again ligated simultaneously with vector pSP65 DNA, cleaved
with EcoRI and treated with calf intestine alkaline phosphatase
(Boehringer, Mannheim, BRD). About 30 % of the resulting colonies
harbored a plasmid with the desired, full length vWF cDNA insert of
about 8,795 bp, as verified by restriction enzyme analysis and
nucleotide sequence analysis.
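
A similar consistency check can be applied to the pSP8800vWF assembly.
The sketch below assumes that the numbering has no position 0 (position
-1 is followed directly by position 1), an assumption under which the
EcoRI-XhoI coordinates reproduce the stated 1,327 bp. Under the same
assumption the XhoI-HindIII coordinates give 1,144 bp rather than the
stated 1,044 bp, and only the coordinate-derived total comes close to
the quoted insert of about 8,795 bp. This may point to a misprint in
the stated length, but the text alone cannot settle it. All names are
illustrative.

```python
def span(start: int, end: int) -> int:
    """Length between two quoted positions, assuming there is no position 0."""
    length = end - start
    if start < 0 < end:
        length -= 1  # assumption: numbering jumps from -1 to 1
    return length


# (fragment, start, end, stated length in bp); None where no positions are quoted
FRAGMENTS = [
    ("EcoRI-XhoI", -236, 1092, 1327),
    ("XhoI-HindIII", 1092, 2236, 1044),
    ("HindIII-EcoRI", None, None, 6330),
]


def totals(fragments):
    """Return (sum of stated lengths, sum using coordinate-derived lengths)."""
    stated_total = 0
    derived_total = 0
    for _name, start, end, stated in fragments:
        stated_total += stated
        derived_total += stated if start is None else span(start, end)
    return stated_total, derived_total


if __name__ == "__main__":
    stated_total, derived_total = totals(FRAGMENTS)
    print("sum of stated lengths:     ", stated_total)
    print("sum from quoted positions: ", derived_total)
```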

In vitro transcription and translation:

In vitro transcription of linear SP6-based DNA templates with
SP6 RNA polymerase (New England Nuclear, Dreieich, BRD) was
performed in the presence of 0.1 mM UTP, CTP and ATP, 0.05 mM GTP and
2 mM of m7G(5')ppp(5')G (Pharmacia, Uppsala, Sweden) to provide
mRNA preparations with a capped terminus (Melton, D.A., et al.
(1984) Nucl. Acids Research 12, 7035-7056). In vitro translation of
such 5' capped mRNAs was done in a rabbit reticulocyte lysate
system (New England Nuclear, Dreieich, BRD), according to the
manufacturer's specifications.

Analysis of the in vitro translation products was performed by
electrophoresis on an 8 % SDS-polyacrylamide gel as described
(Laemmli, U.K. (1970) Nature 227, 650-655).

RESULTS

Construction of partial vWF cDNA clones and assembly of full length

struction of plasmids containing part of a full length human vWF
cDNA (Verweij, C.L., et al. (1985) Nucleic Acids Research 13, 4699-
4717, based on Dutch patent application 85.00961; Lynch, D.C., et
al. (1985) Cell 41, 49-56; Ginsburg, M., et al. (1985) J. Biol.
Chem. 260, 3931-3936; Sadler, J.E., et al. (1985) Proc. Natl. Acad.
Sci. USA 82, 6394-6398). The most extended vWF cDNA that
applicants obtained from an oligo(dT)-primed human endothelial cDNA
library comprised about 2,280 bp (pvWF2280). Nucleotide sequence
analysis revealed that this cDNA insert has been initiated at the
polyA tail of vWF mRNA. To construct a full length vWF cDNA,
applicants have isolated additional, overlapping vWF cDNA sequences
which are located upstream of pvWF2280. For that purpose two
biochemical selections were employed to enrich for vWF cDNA
harboring plasmids. First, oligonucleotide primers,
derived from the partial nucleotide sequence (Sadler, J.E., et al.
(1985) Proc. Natl. Acad. Sci. USA 82, 6394-6398), were synthesized
to direct cDNA synthesis with human endothelial polyA RNA as
substrate. Second, cDNA preparations were digested with particular
restriction endonucleases, known to cut vWF cDNA at a limited
number of sites (Ginsburg, M., et al. (1985) J. Biol. Chem. 260,
3931-3936; Sadler, J.E., et al. (1985) Proc. Natl. Acad. Sci. USA
82, 6394-6398). These restriction sites can subsequently be
employed to assemble a full length vWF cDNA. The cloning strategy
is outlined in Figure 1A. The plasmids, containing adjacent vWF
cDNA sequences, were designated pvWF1330, pvWF1800, pvWF2600,
pvWF2084 and pvWF2280. The nucleotide sequence of the 5' part of
pvWF1330 DNA (corresponding with the 5' end of vWF mRNA) revealed
that nonsense codons were present in all three reading frames. From
this finding applicants conclude that pvWF1330 DNA extends beyond
the translation initiation codon.

S1 nuclease protection experiments with human endothelial RNA
were performed to prove that the various vWF cDNA inserts are fully
complementary to vWF mRNA. The construction of the probes and the
conditions used are described in the section Materials and methods.
The results are shown in Figure 2. In all cases the length of the
protected fragments is in accord with the predicted length of the
different probes. From these data applicants conclude that the vWF

are entirely complementary with vWF mRNA. The nucleotide sequences
of the remaining cDNA inserts of plasmids pvWF2084 and pvWF2280
were shown before to correspond with the published sequence
(Sadler, J.E., et al. (1985) Proc. Natl. Acad. Sci. USA 82,
6394-6398). Consequently, the different, adjacent vWF cDNA
sequences are genuine copies of vWF mRNA.

The vWF cDNA sequences that applicants have constructed span a
length of about 8,900 bp. This length is consistent with the size
of vWF mRNA, determined by Northern blot analysis of human
endothelial (polyA) RNA (Verweij, C.L., et al. (1985) Nucleic Acids
Research 13, 4699-4717, based on Dutch patent application 85.00961).
A detailed description of the assembly of full length vWF cDNA is
given in the section Materials and methods. The correct composition
of the assembled, full length vWF cDNA was established by
restriction enzyme analysis.

Nucleotide sequence of full length vWF cDNA: 

Nucleotide sequence analysis of vWF cDNA fragments was carried
out both by the chemical degradation method (Maxam, A.M. et al.,
1977) and by the dideoxy chain-termination procedure (Sanger, F.,
et al. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467), according
to the scheme outlined in Figure 1A. In Figure 3 the nucleotide
sequence of about 8806 residues, extending from the 5' end of vWF
mRNA, and the corresponding predicted amino acid sequence are
presented. The remaining nucleotide sequence of the 3' part of vWF
mRNA has been reported before (Sadler, J.E. et al., (1985) Proc.
Natl. Acad. Sci. USA 82, 6394-6398; Verweij, C.L. et al., (1985)
Nucleic Acids Research 13, 4699-4717, based on Dutch patent
application 85.00961). In general the overlapping part of our
nucleotide sequence of vWF cDNA with that of Sadler, J.E. et al.,
(1985) Proc. Natl. Acad. Sci. USA 82, 6394-6398 reveals no
differences. However, the first 12 nucleotides (corresponding with
position 2217 till 2229) at the 5' terminus of vWF cDNA on phage

whereas Sadler, J.E. et al., (1985) Proc. Natl. Acad. Sci. USA 82,
6394-6398 report an A residue, resulting in, respectively, a
proline and a histidine at that particular position. This
difference might be due to a polymorphism in the vWF gene, although
the proline residue has been established independently by automated
amino acid sequence analysis of the mature vWF protein (Hessel, B.
et al., (1984) Thrombosis Research 35, 637-651).

The total length spanned by the adjacent vWF cDNA sequences is
8,814 bp, excluding the polyA tail of vWF mRNA. The translation
initiation site was assigned to the ATG codon indicated (position 1
till 4), being the first initiator codon downstream of the TAG
nonsense codon (position -79 till -82), which is followed by an
"open" translation reading frame of 8,439 nucleotides (deduced from
our sequence data and those of Sadler, J.E. et al., (1985) Proc.
Natl. Acad. Sci. USA 82, 6394-6398). This assignment is supported
by the observation that the predicted 22 N-terminal amino acid
residues display the characteristic features of a signal peptide.
The cleavage site for a signal peptidase with the highest
probability is located between the glycine and alanine residues
at, respectively, position 22 and 23 (Von Heijne, (1983) Eur. J.
Biochem. 133, 17-21). The proposed translation initiation codon is
preceded by an untranslated region of at least 230 nt.

A continuous vWF cDNA coding sequence of 8,439 bp potentially
programs the synthesis of a polypeptide of 2,813 amino acid
residues, with a calculated MW of 309 kD. To applicants' knowledge
this represents the longest coding sequence determined to date. A
computer aided search for (partially) homologous amino acid sequences
in other proteins, contained within the N1E Protein Sequence Data
Bank, did not reveal any major similarity with other proteins.
Furthermore, it has been reported that mature vWF protein is a
glycoprotein, containing approximately 15 % carbohydrate residues
(Sodetz, J.M. et al., (1979) J. Biol. Chem. 254, 10754-10760). If
it is assumed that the carbohydrate moieties also contribute about
15 % by weight to the calculated molecular weight of pro-vWF, then
the molecular weight of pro-vWF will amount to approximately 350 kD.
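
The arithmetic behind these figures can be sketched as follows. The
reading frame length (8,439 nt), the calculated MW (309,250 D, as
quoted in this document for the full precursor) and the 15 %
carbohydrate content come from the text. The two ways of applying the
15 % are both shown because the text does not say which was meant;
either lands in the neighbourhood of the quoted 350 kD. Function names
are illustrative.

```python
ORF_NT = 8439                 # open reading frame, nucleotides (from the text)
CALC_MW_DA = 309_250          # calculated MW of the unglycosylated precursor
CARBOHYDRATE_FRACTION = 0.15  # about 15 % carbohydrate (from the text)


def residue_count(orf_nt: int) -> int:
    """Number of codons, and hence amino acid residues, in the reading frame."""
    if orf_nt % 3:
        raise ValueError("reading frame length is not a multiple of 3")
    return orf_nt // 3


def mean_residue_mass(mw_da: float, residues: int) -> float:
    """Average mass per residue implied by the calculated MW."""
    return mw_da / residues


def glyco_mw_added(mw_da: float, fraction: float) -> float:
    """Reading 1: carbohydrate adds 15 % on top of the polypeptide mass."""
    return mw_da * (1 + fraction)


def glyco_mw_of_total(mw_da: float, fraction: float) -> float:
    """Reading 2: carbohydrate makes up 15 % of the final glycoprotein mass."""
    return mw_da / (1 - fraction)


if __name__ == "__main__":
    n = residue_count(ORF_NT)
    print("residues:", n)
    print("mean residue mass (D):", round(mean_residue_mass(CALC_MW_DA, n), 1))
    print("glycosylated MW, reading 1 (kD):",
          round(glyco_mw_added(CALC_MW_DA, CARBOHYDRATE_FRACTION) / 1000))
    print("glycosylated MW, reading 2 (kD):",
          round(glyco_mw_of_total(CALC_MW_DA, CARBOHYDRATE_FRACTION) / 1000))
```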
Peculiarities of the amino acid sequence of pre-vWF

with the established N-terminal amino acid sequence of mature vWF
protein (Hessel, B. et al., (1984) Thrombosis Research 35, 637-651)
confirms our earlier assumption (Verweij, C.L. et al., (1985)
Nucleic Acids Research 13, 4699-4717, based on Dutch patent
application 85.00961) that the vWF precursor protein is considerably
larger than the reported 260 kD glycoprotein (Wagner, D.D. et al.,
(1983) J. Biol. Chem. 258, 2065-2067). Alignment of the predicted
amino acid sequence with the N-terminal sequence of the mature
(225 kD) vWF protein shows that the nucleotide sequence which codes
for mature vWF protein initiates at position 2290. This conclusion
implies that the vWF precursor protein is 763 amino acid
residues larger than the mature protein. Obviously, this
pro-sequence (calculated MW 81 kD) is removed by protein processing
to yield the mature vWF glycoprotein with a molecular weight of about
225 kD.
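
The residue counts quoted in this document fit together in a simple
way, sketched below. The sketch assumes that nucleotide 1 is the A of
the initiator ATG and that the 763-residue figure includes the
22-residue signal peptide; on that numbering the first codon of mature
vWF begins at nucleotide 2290. Names are illustrative.

```python
PRECURSOR_RESIDUES = 2813   # full reading frame (from the text)
PREPRO_RESIDUES = 763       # residues preceding mature vWF (from the text)
SIGNAL_RESIDUES = 22        # predicted signal peptide (from the text)


def first_mature_nucleotide(prepro_residues: int) -> int:
    """Position of the first nucleotide coding for mature vWF (ATG starts at 1)."""
    return prepro_residues * 3 + 1


def mature_residues(precursor: int, prepro: int) -> int:
    """Residues left for mature vWF once the (pre)-pro-sequence is removed."""
    return precursor - prepro


def pro_residues_without_signal(prepro: int, signal: int) -> int:
    """Pro-sequence length after removal of the signal peptide (an inference)."""
    return prepro - signal


if __name__ == "__main__":
    print(first_mature_nucleotide(PREPRO_RESIDUES))
    print(mature_residues(PRECURSOR_RESIDUES, PREPRO_RESIDUES))
    print(pro_residues_without_signal(PREPRO_RESIDUES, SIGNAL_RESIDUES))
```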

A homology matrix comparison of the amino acid sequence of the
vWF precursor protein reveals a quadruplication of a domain (D1,
D2, D3, D4) with a length of about 350 amino acid residues. Part of
this domain (D', about 96 amino acids) appears to be present at the
N-terminus of mature vWF. Figure 4A shows the alignment of these
repetitive domains. A salient feature of the pro-sequence is that
it largely consists of a duplication of the D domain (D1, D2). The
repeats exhibit a significant conservation of the position of
cysteine residues, indicative of a structural similarity of the
repeats. Interestingly, the pro-sequence comprises an
arginine-glycine-aspartic acid sequence ("RGD" sequence) at position
698 till 702. It has been shown that a tetrapeptide with the
indicated amino acid sequence can compete with proteins, harboring a
similar sequence, which are involved in cell attachment
(Pierschbacher, M.D. et al., (1984) Nature 309, 30-33). It has been
noticed that another RGD sequence is present within the C-terminal
part of the mature vWF protein (Sadler, J.E. et al., (1985) Proc.
Natl. Acad. Sci. USA 82, 6394-6398).

In vitro translation of vWF mRNA

A full length vWF cDNA was assembled in order to demonstrate
its coding capacity for an unglycosylated precursor protein with a
MW of about 300 kD. Full length vWF cDNA was inserted into plasmid
pSP65. This plasmid contains the S. typhimurium bacteriophage SP6
promoter which allows in vitro "run off" transcription of cloned
DNA sequences, specifically directed by SP6 RNA polymerase (Melton,
D.A. et al., (1984) Nucl. Acids Research 12, 7035-7056). Such mRNA
preparations can be efficiently translated, using a reticulocyte
lysate.

Initially, plasmid pSP6330vWF (Figure 1B) was constructed,
harboring a continuous, 6,331 bp vWF cDNA sequence. This plasmid
contains the entire coding sequence for mature vWF and in addition
a sequence coding for 18 amino acid residues from the C-terminal
part of the pro-sequence. Initiation of protein synthesis, directed
by RNA transcribed from pSP6330vWF DNA, should occur at the
methionine codon 8 amino acids downstream of the N-terminus of mature
vWF. Translation of the in vitro synthesized vWF mRNA will then
yield an (unglycosylated) polypeptide with a calculated MW of 220
kD. Subsequently, pSP8800vWF (Figure 1B) was constructed harboring
the complete coding sequence for pre-vWF. This plasmid will encode
a protein with a calculated MW of 309,250 D. The plasmids
pSP6330vWF and pSP8800vWF were linearized with, respectively, EcoRI
and SalI and transcribed in vitro. The results of the in vitro
translation of the various vWF mRNAs are given in Figure 5. The
polypeptides encoded by pSP6330vWF DNA display a MW of about 200
kD. The discrepancy of this MW with the calculated MW (220 kD) is
probably due to inaccuracy in the MW estimation of large proteins
in these gels. The complete coding sequence of pSP8800vWF DNA is
translated into a polypeptide with a MW substantially larger than
200 kD. To achieve a more accurate MW estimation for this
extraordinarily long polypeptide, we produced partial, overlapping
polypeptides derived from selected portions of full length vWF cDNA.
To that end pSP8800vWF DNA was digested with BamHI and the transcript
(2855 nt long) was translated. It should be noted that 405 nt at
the 3' end of this transcript constitute the 5' terminus of the
transcript generated from pSP6330vWF cleaved with EcoRI. Hence, an

about 309 kD, after subtracting
protein, derived from the BamHI

protein region. The protein, derived
template, has an estimated MW of about 100 kD, while as shown
before template pSP6330vWF cleaved with EcoRI yields a product of
about 200 kD. Addition of these MWs and subtraction of the common
region (15 kD) results in an estimated MW of 285 kD which is in
accord with the calculated MW of 309,250 D. Furthermore,
translation of transcripts (1320 nt), derived from pSP8800vWF DNA
digested with XhoI, reveals a polypeptide with a MW of 39 kD. This
result is in agreement with the assignment of the translation
initiation site at position 1 till 4.
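
The mass estimates in this section can be reproduced approximately with
a mean residue mass derived from the calculated precursor MW (309,250 D
over 2,813 residues, both quoted in this document). The sketch below
assumes that the 405 nt overlap is entirely coding sequence and that
translation of the XhoI run-off transcript starts at position 1 and
continues to the XhoI site near position 1092; both are assumptions,
and the names are illustrative.

```python
CALC_MW_DA = 309_250
PRECURSOR_RESIDUES = 2813
MEAN_RESIDUE_DA = CALC_MW_DA / PRECURSOR_RESIDUES  # roughly 110 D per residue


def coding_mass_kd(coding_nt: int) -> float:
    """Approximate mass (kD) of the polypeptide encoded by coding_nt nucleotides."""
    return (coding_nt // 3) * MEAN_RESIDUE_DA / 1000


def combined_estimate_kd(first_kd: float, second_kd: float, overlap_kd: float) -> float:
    """Sum of two overlapping translation products minus the shared region."""
    return first_kd + second_kd - overlap_kd


if __name__ == "__main__":
    # 405 nt shared by the BamHI and EcoRI run-off transcripts (text: 15 kD).
    print("overlap (kD):", round(coding_mass_kd(405), 1))
    # Observed products of about 100 kD and 200 kD (text: 285 kD combined).
    print("combined (kD):", combined_estimate_kd(100, 200, 15))
    # Coding part of the XhoI run-off transcript (text: 39 kD observed).
    print("XhoI product (kD):", round(coding_mass_kd(1092), 1))
```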

From these data applicants conclude that they have constructed
a full length vWF cDNA with a coding sequence of 8,439 bp which
programs the synthesis of a precursor vWF protein consisting of
2,813 amino acid residues.

DISCUSSION 

This invention provides the construction of a plasmid
containing full length vWF cDNA. Nucleotide sequence analysis (this
patent application; Verweij, C.L. et al., (1985) Nucleic Acids
Research 13, 4699-4717, based on Dutch patent application 85.00961;
Sadler, J.E. et al., (1985) Proc. Natl. Acad. Sci. USA 82, 6394-
6398) revealed that the length of the assembled vWF cDNA, excluding
cDNA derived from the polyA tail, amounts to 8,805 bp. This result
is in agreement with the length of vWF mRNA (about 9,000 nt) as
determined by Northern blot analysis of endothelial (polyA) RNA
(Verweij, C.L. et al., (1985) Nucleic Acids Research 13, 4699-4717,
based on Dutch patent application 85.00961). The entire coding
sequence for a precursor vWF protein is 8,439 bp, corresponding to
an unglycosylated polypeptide with a MW of about 309 kD. The
translation initiation site for this protein could be assigned to
the ATG codon at position 1 till 4 (see Figure 3). This assignment
is in accord with the results of in vitro translation experiments

attributed to the precursor glycoprotein for mature vWF was
reported to display a MW of 260 kD (Wagner, D.D. et al., (1983) J.
Biol. Chem. 258, 2065-2067). The discrepancy with the MW that
applicants assign to the precursor protein may be due to inaccuracy
of MW estimations by SDS-polyacrylamide gel electrophoresis, inherent
to large (glyco)proteins.

The sequence encoding mature vWF initiates at 2,286 bp downstream
of the translation initiation codon, as inferred from alignment of
the established N-terminal amino acid sequence of mature vWF
(Hessel, B. et al., (1984) Thrombosis Research 35, 637-651) with the
predicted amino acid sequence (Figure 3). This 2,286 bp long
prepro-sequence was shown to be able to encode a polypeptide with a
calculated MW of 83 kD.

Consequently, the pro-vWF protein will be processed by a protease
to yield two distinct polypeptides. It is conceivable that the
specific cleavage between the arginine and serine residues at
positions 763 and 764 occurs within the Weibel-Palade bodies of the
endothelial cell, since in the presence of a protease inhibitor
(PMSF) a complex can be detected within these organelles of mature
vWF with another glycoprotein, designated von Willebrand Antigen II
(vW AgII) (Montgomery, R.R. and Zimmerman, T.S. (1978) J. Clin.
Invest. 61, 1498-1507). Several arguments can be advanced which
indicate that it is likely that the pro-sequence of the precursor vWF
protein is identical to vW AgII.

i) the MW of the unglycosylated pro-sequence (83 kD) fits with
the reported MW of the vW AgII glycoprotein of 98 kD.

ii) it has been demonstrated that both vWF and vW AgII are
synthesized by cultured endothelial cells and that these proteins
are simultaneously released upon stimulation with 1-desamino-8-D-
arginine vasopressin (DDAVP) (McCarroll, D.R. et al., 1984 Blood 63,

iii) it has been shown using immunofluorescence techniques that
both proteins are localized in the perinuclear region and in the
Weibel-Palade bodies (McCarroll et al. (1985), J. Clin.
Invest. 75, 1089-1095).

iv) both vWF and vW AgII are present in platelets and released
together after platelet activation.

v) the levels of vWF and vW AgII protein are linearly associated

of a patient with severe von Willebrand's disease.

vi) as mentioned before a complex between vWF and vW AgII can be
detected in the presence of a serine protease inhibitor (PMSF) and
not in the absence of the inhibitor.

The predicted amino acid sequence of the pro-sequence displays a
remarkable structure. It is composed of a duplicated segment of
about 350 amino acid residues. These two segments share 37 %
amino acid homology. Furthermore, they exhibit a considerable
conservation of similarly located cysteine residues, indicating
that structural features have been maintained within these direct
repeats. Two copies of this repeat within the pro-sequence are also
present within mature vWF, whereas part of this repeat is present
at the N-terminus of mature vWF. Internal homologous regions have
also been reported by Sadler, J.E. et al., Proc. Natl. Acad. Sci. USA
82, 6394-6398 (1985), two of which have been duplicated, while one
is present in triplicate form (Fig. 4B). These repeated sequences
span a length of about 1070 amino acid residues within the mature
vWF protein. The repeated structures that applicants have found are
independent of the ones reported by Sadler, J.E. et al., Proc. Natl.
Acad. Sci. USA 82, 6394-6398 (1985). From these data applicants
conclude that about 90 % of the precursor vWF protein is constituted
of repetitive regions, indicating that the precursor vWF gene has
evolved from a series of duplicative events of at least four
different regions.
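
One way to arrive at a figure of about 90 % is sketched below. The
text does not show its own arithmetic, so the breakdown is an
assumption: four D-type repeats of about 350 residues each, the
partial repeat of about 96 residues at the N-terminus of mature vWF,
and the roughly 1070 residues of repeats reported by Sadler et al.,
taken against a precursor of 2,813 residues. Names are illustrative.

```python
PRECURSOR_RESIDUES = 2813   # full precursor (from the text)
D_REPEAT_COPIES = 4         # D1 to D4 (from the text)
D_REPEAT_LENGTH = 350       # approximate length of each D repeat
PARTIAL_D_LENGTH = 96       # partial repeat at the N-terminus of mature vWF
OTHER_REPEATS_LENGTH = 1070 # repeats reported by Sadler et al.


def repetitive_residues() -> int:
    """Total residues assigned to repeated regions under the stated assumptions."""
    return D_REPEAT_COPIES * D_REPEAT_LENGTH + PARTIAL_D_LENGTH + OTHER_REPEATS_LENGTH


def repetitive_fraction() -> float:
    """Share of the precursor made up of repeated regions."""
    return repetitive_residues() / PRECURSOR_RESIDUES


if __name__ == "__main__":
    print(repetitive_residues(), "of", PRECURSOR_RESIDUES, "residues")
    print(f"{repetitive_fraction():.0%} of the precursor")
```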

The pro-sequence may participate in the formation of large multimers,
composed of vWF dimers (Wagner, D.D. and Marder, V.J. (1984),
J. of Cell Biology 99, 2123-2130; Lynch, D.E. et al. (1983), The
Journal of Biological Chemistry 258, 12757-12760). In this respect
it may be relevant to note that pro-vWF is extremely rich in
cysteines (8.1 %), which are mainly located in the N- and C-terminal
parts of the protein. The cysteine content of the pro-sequence is
[...] of vWF are required to form rod-shaped domains, connecting two
subunits. Multimer formation should occur by joining of these dimers,
[...] globular domains. Free sulfhydryl groups of the cysteine residues
in the pro-sequence are probably a prerequisite for the formation
of multimers.
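
A cysteine content such as the 8.1 % quoted above is simply the share of C residues in the one-letter sequence. The sketch below illustrates the calculation in Python; the function name is hypothetical and the 20-residue excerpt is one line of the predicted sequence of Fig. 3, used only as sample input.

```python
def residue_percentage(protein: str, residue: str = "C") -> float:
    """Share of one residue type in a one-letter protein sequence, in percent."""
    sequence = protein.upper()
    if not sequence:
        raise ValueError("empty sequence")
    return 100.0 * sequence.count(residue.upper()) / len(sequence)


# Sample input: a single 20-residue line from the predicted sequence.
print(residue_percentage("CVDGCSCPEGQLLDEGLCVE"))
```

Applied to the complete pro-vWF sequence instead of one line, the same call would give the overall cysteine content.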



The presence of an "RGD(C)" amino acid sequence within the
pro-sequence may be indicative of another function of this protein. It
has been shown that an RGD-containing region on proteins such as
fibronectin and vitronectin carries out a crucial role in the
interaction with receptors on a cell surface (Pierschbacher, M.D. and
Ruoslahti, E. (1984) Nature 309, 30-33; Pytela, R. et al., Proc. Natl.
Acad. Sci. USA 82, 5766-5770). Those interactions are inhibited by
RGD-containing peptides. Interaction of mature vWF with activated
platelets is also inhibited by RGD-containing peptides, suggesting
that this region on vWF is involved in platelet binding (Ginsburg,
M., 1985, J. Biol. Chem. 260, 3931-3936; Haverstick, P.M. et al.,
1985, Blood). Based on the presence of an RGD sequence both in
mature vWF and the pro-sequence, which may be equivalent to vW
AgII, and on a striking homology and a structural conservation
between these two proteins, applicants assume that the pro-sequence
might have a function similar to that of the mature vWF protein in a
specific interaction with particular cell surface receptors.
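
Locating an RGD or RGDC tripeptide or tetrapeptide in a predicted sequence is a plain substring search. A minimal Python sketch follows; the function name is hypothetical, and the 20-residue sample is one line of the predicted sequence of Fig. 3, so the positions returned are relative to that excerpt and not to the numbering of the figure.

```python
import re


def find_motif(protein: str, motif: str = "RGD") -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) motif hit."""
    pattern = re.compile(f"(?={re.escape(motif.upper())})")
    return [match.start() + 1 for match in pattern.finditer(protein.upper())]


sample = "EGCFCPPGLYMDERGDCVPK"  # excerpt only; positions are local to it
print(find_motif(sample))          # RGD
print(find_motif(sample, "RGDC"))  # RGD followed by cysteine
```

The lookahead form of the pattern is used so that overlapping occurrences of a motif would each be reported.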



LEGENDS TO THE FIGURES 

Fig. 1 Strategy for the construction of vWF cDNAs, the assembly of
full length vWF cDNA and the determination of the nucleotide
sequence.

A) vWF mRNA is indicated by a bar; open area, signal peptide
coding region; hatched area, pro-sequence coding region; solid
area, mature vWF coding region.

The oligonucleotides (20-mers) A (6901-6921), B (4819-4839) and C
(2467-2487), which were used for primer-directed cDNA synthesis
and/or as probe for hybridizations, are indicated by small bars.
The 575 bp BglII-BamHI and the 350 bp HindIII-XhoI fragments which
were used as probes for colony-screening are indicated by open
bars. Below the schematic representation of vWF mRNA the five
partial, adjacent vWF cDNAs are given which were used for the assembly
of full length vWF cDNA and for nucleotide sequencing. The
fragments I, II, III, IV and V, which were used in the S1 nuclease
protection experiments, are shown above the vWF cDNA insert from which
they were derived. The arrows indicate the nucleotide sequencing
strategy. In the case of sequence analysis according to the
procedure of Maxam and Gilbert (1977), the position of the radioactive
labelling is given by a short vertical line at the end of an arrow.
The slashes at the end of arrows mean that the end labelling was at
a terminus, specified by vector DNA. Only restriction endonuclease
sites which are relevant in this study are given. B, BamHI; Bg,
BglII; E, EcoRI; H, HindIII; K, KpnI; M, Met I; N, NarI; P, PvuII;
S, SalI; Sc, SacI; X, XbaI; Xh, XhoI.

B) Assembly of full length vWF cDNA. Plasmid pSP6330vWF contains
a 6,331 bp vWF cDNA sequence, extending from the HindIII site
(position 2235) to the SacI site (position 8562), subcloned in
vector pSP64. Plasmid pSP8800vWF includes full length vWF cDNA,
extending from the EcoRI site (see Panel A) to the SacI site
(position 8562), subcloned in vector pSP65. Restriction
endonuclease sites, delimiting the fragments used for the assembly of
full length vWF cDNA, are indicated with an asterisk. The EcoRI site
at the 5' end of full length vWF cDNA originates from the EcoRI
linker, used for the construction of pvWF1330 DNA. The sites for
restriction enzymes, which were employed to linearize plasmid DNAs
for in vitro "run off" transcription by SP6 RNA polymerase, are
indicated by a dot. The SalI site in plasmid pSP8800vWF and the
EcoRI site in plasmid pSP6330vWF are present in the polylinkers of
the pSP-type vectors.






01 97592 



hybridized with [32P]-labelled probes, containing vWF cDNA
sequences. The construction of the different probes and the
conditions used are described in the section Experimental Procedures.
The vWF cDNA segments, present in the probes, are shown in Fig. 1.
Panel A shows the results after electrophoresis of the sample in a
0.8 % alkaline agarose gel. Panel B gives the results after
electrophoresis in a 6 % polyacrylamide - 8 M urea gel. Lanes 1:
hybridization with probe III, containing the 1,444 bp vWF cDNA fragment
III (Fig. 1). Lanes 2: hybridization with probe V, containing the
2,400 bp vWF cDNA fragment V (Fig. 1). Lanes 3: hybridization with
probe I which is equivalent to the 585 bp vWF cDNA fragment I (Fig.
1). Lanes 4: hybridization with probe IV, containing the 565 bp vWF
cDNA fragment IV (Fig. 1). Lanes 5: hybridization with probe II,
containing the 765 bp vWF cDNA fragment II (Fig. 1). Symbols:
-, incubation of hybridized components in the absence of S1 nuclease;
+, incubation of the hybridized components in the presence of S1
nuclease; c, incubation of the samples with S1 nuclease after
hybridization in the absence of endothelial polyA- RNA; M, single
stranded DNA length markers.

Fig. 3 Nucleotide sequence of 8806 bp of vWF cDNA, derived from the
5' terminus of vWF mRNA. The numbering starts at the putative ATG
translation initiation codon. The predicted amino acid sequence is
shown beneath the nucleotide sequence and is separately numbered,
again starting at the putative methionine translation initiation
codon. Potential N-linked glycosylation sites are underlined. The
tripeptide arginine-glycine-aspartic acid is boxed.
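
The relation between the two rows of the figure can be reproduced with the standard genetic code. The Python sketch below translates one 60-nucleotide line of the figure and scans the result for the usual N-linked glycosylation sequon, Asn-X-Ser/Thr with X not proline. This is an illustration under stated assumptions: the function names are hypothetical, and whether the applicants applied exactly this sequon rule when marking "potential" sites is not stated in the text.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA string in frame 1; trailing bases are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def n_glycosylation_sites(protein: str) -> list[int]:
    """1-based positions of Asn in N-X-S/T sequons, X being any residue but Pro."""
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", protein)]


# One line of the figure, used as sample input (positions are local to it).
line = "AACTTTCAAGTCCTGCTGTCAGACAGATACTTCAACAAGACCTGCGGGCTGTGTGGCAAC"
protein = translate(line)
print(protein)
print(n_glycosylation_sites(protein))
```

Translating each nucleotide line in this way is also a convenient check on the amino acid row printed beneath it.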

Fig. 4 Internal homology within the precursor for vWF.

A) Alignment of the amino acid sequences of the four repeated
domains D1, D2, D3, D4 and D'. The one-letter notation is used and
the amino acids are numbered as indicated in Fig. 3. Residues which
are identical among the four or five repeats are boxed.

B) Schematic representation of internal homologous regions within
pro-vWF. Indicated are the triplicated domain A (A1, A2 and A3) and
two duplicated domains B (B1 and B2) and C (C1 and C2), as reported
by Sadler et al. (1985), and the quadruplicated domain D (D1, D2,
D3 and D4). The numerical positions of these repeats are listed: A1
(residues 1262 till 1480), A2 (1480-1673), A3 (1673-1875), B1
(2296-2331), B2 (2375-2400), C1 (2400-2516), C2 (2544-2663), D1
(34-387), D2 (387-746), D3 (866-1242), D4 (1947-2299) and D'
(769-866).
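
The share of the precursor taken up by the listed repeats can be estimated from the residue ranges in this legend. The Python sketch below merges overlapping or abutting ranges and counts each residue once. It is arithmetic on the boundaries as listed, not the applicants' own calculation; the names are hypothetical, the precursor length of 2813 residues is the figure given in the abstract, and the entry at 2296-2331 is read as B1.

```python
# Residue ranges (inclusive) as listed in the legend to Fig. 4.
REPEATS = {
    "D1": (34, 387), "D2": (387, 746), "D'": (769, 866), "D3": (866, 1242),
    "A1": (1262, 1480), "A2": (1480, 1673), "A3": (1673, 1875),
    "D4": (1947, 2299), "B1": (2296, 2331), "B2": (2375, 2400),
    "C1": (2400, 2516), "C2": (2544, 2663),
}
PRECURSOR_LENGTH = 2813  # residues, as stated in the abstract


def merge_intervals(intervals):
    """Merge inclusive intervals that overlap or share a boundary residue."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]


def covered_residues(intervals):
    """Number of distinct residues covered by the intervals."""
    return sum(end - start + 1 for start, end in merge_intervals(intervals))


covered = covered_residues(REPEATS.values())
print(covered, round(100 * covered / PRECURSOR_LENGTH, 1))  # 2448 87.0
```

On these boundaries the repeats cover 2448 of 2813 residues, or about 87 %, which is in line with the "about 90 %" stated in the description.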



Fig. 5 In vitro translation of vWF cRNA.

Capped vWF cRNA was prepared in vitro, using "run off" transcription
with SP6 RNA polymerase, as described in the section Experimental
Procedures. The RNA preparations were added to a reticulocyte
lysate translation system, containing [35S]-methionine, and
polypeptides were synthesized for 90 min. The polypeptides were
fractionated on an 8 % SDS-polyacrylamide gel and then subjected to
fluorography. M: MW marker proteins. E: endogenously synthesized
polypeptides (without added RNA). Lane 1: polypeptides encoded by
vWF mRNA transcribed from pSP6330vWF DNA, digested with EcoRI. Lane
2: polypeptides encoded by vWF mRNA transcribed from pSP8800vWF
DNA, digested with SalI. Lane 3: polypeptides encoded by vWF mRNA
transcribed from pSP8800vWF DNA, digested with BamHI. Lane 4:
polypeptides encoded by vWF mRNA transcribed from pSP8800vWF DNA,
digested with XhoI.

A sample of the recombinant DNA plasmid pSP8800vWF in strain
E. coli DH 1 was deposited at the "Centraalbureau voor
Schimmelcultures" in Baarn, The Netherlands, under number CBS 163.86 on
March 26, 1986.
1. cDNA fragment which can be introduced into a recombinant cDNA
[...] corresponds at least partially to the gene which codes for the
biological activity of the human von Willebrand factor.

2. cDNA fragment according to claim 1, characterized in that the
said cDNA fragment has the nucleotide sequence shown in Figure 3 or
a part thereof.

3. Recombinant cDNA plasmid or phage, characterized in that the 
cDNA fragment provided therein corresponds at least partially to 
the gene which codes for the von Willebrand factor. 

4. Recombinant cDNA plasmid or phage according to claim 3,
characterized in that the cDNA fragment introduced has the
nucleotide sequence shown in Figure 3 or a part thereof.

5. Recombinant cDNA plasmid according to claims 3 or 4,
characterized in that the plasmid contains the vector pSP65.

6. Microorganism, animal cell or human cell containing a recombi- 
nant cDNA plasmid or phage, characterized in that the recombinant 
cDNA plasmid or phage is defined in any of the claims 3-5. 

7. Microorganism according to claim 6, characterized in that the 
microorganism is Escherichia coli. 

8. Microorganism according to claim 7, characterized in that the
microorganism is the strain E. coli DH 1.

9. Strain E. coli DH 1 containing the recombinant cDNA plasmid
pSP8800vWF deposited at the Centraalbureau voor Schimmelcultures in
Baarn, The Netherlands under C.B.S. number 163.86.

10. Method for the preparation of proteins by the cultivation of a
microorganism or, respectively, animal or human cells containing a
recombinant cDNA plasmid or phage, characterized by cultivating a
host defined in any of the claims 6-9.

11. (Glyco)proteins obtained by the method according to claim 10.

12. vWF (glyco)protein having the amino acid sequence corresponding
to the nucleotide sequence of 2518-8667 shown in Figure 3.

13. (Glyco)protein having the amino acid sequence corresponding
to the nucleotide sequence of 295-2517 shown in Figure 3.

14. Pharmaceutical composition containing one or more biologically
[...]

[...] Pharmaceutical composition containing the biologically active
(glyco)proteins according to claims 12 or 13.
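
The nucleotide ranges named in claims 12 and 13 can be converted into residue counts by simple arithmetic, which gives a quick consistency check against the sequence lengths stated in the abstract. The Python sketch below does this; the function name is hypothetical and the check assumes that both ranges are inclusive and in frame.

```python
def codon_count(first_nt: int, last_nt: int) -> int:
    """Number of codons in an inclusive, in-frame nucleotide range."""
    length = last_nt - first_nt + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("range is empty or not a whole number of codons")
    return length // 3


print(codon_count(2518, 8667))  # range of claim 12
print(codon_count(295, 2517))   # range of claim 13
```

The range of claim 12 corresponds to 2050 codons, matching the 2050 amino acids given for mature vWF. The range of claim 13 corresponds to 741 codons, 22 fewer than the 763-residue (pre)-pro-sequence; reading that difference as the signal peptide is an inference and is not stated in the claims.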
vWF TOTAL

* = M
- = STOP



6AA AGGGAGGGTGGTTGGTGGATGTCACAGCTTGGGCTTTATCTCCCCCAGCAGTGGGAT 

91 . • 121 

TCCACAGCCCCTGGGCTACATAACAGCAAGACAGTCCGGAGCTGTAGCAGACCTGATTGA 



GCCTTTGCAGCAGCTGAGAGCATGGCCTAGGGTGGGCGGCACCATTGTCCAGCAGCTGAG 

211 . - 241 

TTTCCCAGGGACCTTGGAGATAGCCGCAGCCCTCATTTGCAGGGGAAGATGATTCCTGCC 

* I P A 

271 - I 301 

AGATTTGCCGGGGTGCTGCTTGCTCTGGCCCTCATTTTGCCAGGGACCCTTTGTGCAGAA
RFAGVLLALAL I LPGTLCjAE 

331 . . 361

GGAACTCGCGGCAGGTCATCCACGGCCCGATGCAGCCTTTTCGGAAGTGACTTCGTCAAC
GTRGRSSTARCSLFGSDFVN

391 . . 421

ACCTTTGATGGGAGCATGTACAGCTTTGCGGGATACTGCAGTTACCTCCTGGCAGGGGGC
TFDGS*YSFAGYCSYLLAGG

451 . . 481

TGCCAGAAACGCTCCTTCTCGATTATTGGGGACTTCCAGAATGGCAAGAGAGTGAGCCTC
CQKRSFSIIGDFQNGKRVSL

511 . . 541

TCCGTGTATCTTGGGGAATTTTTTGACATCCATTTGTTTGTCAATGGTACCGTGACACAG
SVYLGEFFDIHLFVNGTVTQ

571 . . 601

GGGGACCAAAGAGTCTCCATGCCCTATGCCTCCAAAGGGCTGTATCTAGAAACTGAGGCT
GDQRVS*PYASKGLYLETEA

631 . . 661

GGGTACTACAAGCTGTCCGGTGAGGCCTATGGCTTTGTGGCCAGGATCGATGGCAGCGGC
GYYKLSGEAYGFVARIDGSG

691 . . 721

AACTTTCAAGTCCTGCTGTCAGACAGATACTTCAACAAGACCTGCGGGCTGTGTGGCAAC
NFQVLLSDRYFNKTCGLCGN

751 . . 781

TTTAACATCTTTGCTGAAGATGACTTTATGACCCAAGAAGGGACCTTGACCTCGGACCCT
FNIFAEDDF*TQEGTLTSDP

811 . . 841

TATGACTTTGCCAACTCATGGGCTCTGAGCAGTGGAGAACAGTGGTGTGAACGGGCATCT
YDFANSWALSSGEQWCERAS

871 . . 901

CCTCCCAGCAGCTCATGCAACATCTCCTCTGGGGAAATGCAGAAGGGCCTGTGGGAGCAG
PPSSSCNISSGE*QKGLWEQ






931 . . 961

TG~CAGCTTCTGhAGh5CACCTC6GT6TTTGCCCECTGCCACCCTCTGGTG3ACCCCGAG 
CQLLKSTSVFARCHPLVDPE 

991 . . 1021 

CCTTTTGTGGCCCTGT61 GAG A AG AC T " G TG T G AG T GTGC T GGGGG3C T G3 AG T GCGCC 
PFVALCEK TLCECAGGLECA 

1051 . . 1081

TGCCCTGCCCTCCTGGAGTACGCCCGGACCTGTGCCCAGGAGGGAATGGTGCTGTACGGC
CPALLEYARTCAQEG*VLYG

1111 . . 1141

TGGACCGACCACAGCGCGTGCAGCCCAGTGTGCCCTGCTGGTATGGAGTATAGGCAGTGT
WTDHSACSPVCPAG*EYRQC

1171 . . 1201

GTGTCCCCTTGCGCCAGGACCTGCCAGAGCCTGCACATCAATGAAATGTGTCAGGAGCGA
VSPCARTCQSLHINE*CQER

1231 . . 1261

TGCGTGGATGGCTGCAGCTGCCCTGAGGGACAGCTCCTGGATGAAGGCCTCTGCGTGGAG
CVDGCSCPEGQLLDEGLCVE

1291 . . 1321

AGCACCGAGTGTCCCTGCGTGCATTCCGGAAAGCGCTACCCTCCCGGCACCTCCCTCTCT
STECPCVHSGKRYPPGTSLS

1351 . . 1381

CGAGACTGCAACACCTGCATTTGCCGAAACAGCCAGTGGATCTGCAGCAATGAAGAATGT
RDCNTCICRNSQWICSNEEC

1411 . . 1441

CCAGGGGAGTGCCTTGTCACAGGTCAATCACACTTCAAGAGCTTTGACAACAGATACTTC
PGECLVTGQSHFKSFDNRYF

1471 . . 1501

ACCTTCAGTGGGATCTGCCAGTACCTGCTGGCCCGGGATTGCCAGGACCACTCCTTCTCC
TFSGICQYLLARDCQDHSFS

1531 . . 1561

ATTGTCATTGAGACTGTCCAGTGTGCTGATGACCGCGACGCTGTGTGCACCCGCTCCGTC
IVIETVQCADDRDAVCTRSV

1591 . . 1621

ACCGTCCGGCTGCCTGGCCTGCACAACAGCCTTGTGAAACTGAAGCATGGGGCAGGAGTT
TVRLPGLHNSLVKLKHGAGV

1651 . . 1681

GCCATGGATGGCCAGGACGTCCAGCTCCCCCTCCTGAAAGGTGACCTCCGCATCCAGCGT
A*DGQDVQLPLLKGDLRIQR

1711 . . 1741

ACAGTGACGGCCTCCGTGCGCCTCAGCTACGGGGAGGACCTGCAGATGGACTGGGATGGC
TVTASVRLSYGEDLQ*DWDG
GGGAATTACAATGGCAACCAGGGCGACGACTTCCTTACCCCCTCTGGGCTGGCGGAGCCC
GNYNGNQGDDFLTPSGLAEP

1891 . . 1921

CGGGTGGAGGACTTCGGGAACGCCTGGAAGCTGCACGGGGACTGCCAGGACCTGCAGAAG
RVEDFGNAWKLHGDCQDLQK

1951 . . 1981

CAGCACAGCGATCCCTGCGCCCTCAACCCGCGCATGACCAGGTTCTCCGAGGAGGCGTGC
QHSDPCALNPR*TRFSEEAC

2011 . . 2041

GCGGTCCTGACGTCCCCCACATTCGAGGCCTGCCATCGTGCCGTCAGCCCGCTGCCCTAC
AVLTSPTFEACHRAVSPLPY

2071 . . 2101

CTGCGGAACTGCCGCTACGACGTGTGCTCCTGCTCGGACGGCCGCGAGTGCCTGTGCGGC
LRNCRYDVCSCSDGRECLCG

2131 . . 2161

GCCCTGGCCAGCTATGCCGCGGCCTGCGCGGGGAGAGGCGTGCGCGTCGCGTGGCGCGAG
ALASYAAACAGRGVRVAWRE

2191 . . 2221

CCAGGCCGCTGTGAGCTGAACTGCCCGAAAGGCCAGGTGTACCTGCAGTGCGGGACCCCC
PGRCELNCPKGQVYLQCGTP

2251 . . 2281

TGCAACCTGACCTGCCGCTCTCTCTCTTACCCGGATGAGGAATGCAATGAGGCCTGCCTG
CNLTCRSLSYPDEECNEACL

2311 . . 2341

GAGGGCTGCTTCTGCCCCCCAGGGCTCTACATGGATGAGAGGGGGGACTGCGTGCCCAAG
EGCFCPPGLY*DERGDCVPK

2371 . . 2401

GCCCAGTGCCCCTGTTACTATGACGGTGAGATCTTCCAGCCAGAAGACATCTTCTCAGAC
AQCPCYYDGEIFQPEDIFSD

2431 . . 2461

CATCACACCATGTGCTACTGTGAGGATGGCTTCATGCACTGTACCATGAGTGGAGTCCCC
HHT*CYCEDGF*HCT*SGVP

2491 . . 2521

GGAAGCTTGCTGCCTGACGCTGTCCTCAGCAGTCCCCTGTCTCATCGCAGCAAAAGGAGC
GSLLPDAVLSSPLSHRSKRS

2551 . . 2581

CTATCCTGTCGGCCCCCCATGGTCAAGCTGGTGTGTCCCGCTGACAACCTGCGGGCTGAA
LSCRPP*VKLVCPADNLRAE

2611 . . 2641

GGGCTCGAGTGTACCAAAACGTGCCAGAACTATGACCTGGAGTGCATGAGCATGGGCTGT
GLECTKTCQNYDLEC*S*GC

2671 . . 2701

GTCTCTGGCTGCCTCTGCCCCCCGGGCATGGTCCGGCATGAGAACAGATGTGTGGCCCTG
VSGCLCPPG*VRHENRCVAL



2731 . . 2761 

GAAAGGTGTCCCTGCTTCCATCAGGGCAAGGAGTATGCCCCTGGAGAAACAGTGAA^ATT 
ERCPCFHQGKEYAPGETVK I 

2791 . . 2821 

GGCTGCAACACTTGTGTCTGTCGGGACCGGAAGTGGAACTGCACAGACCATGTGTGTGAT 
GCNTCVCRDRKWNCTDHVCD 

2851 . . 2881

GCCACGTGCTCCACGATCGGCATGGCCCACTACCTCACCTTCGACGGGCTCAAATACCTG
ATCSTIG*AHYLTFDGLKYL

2911 . . 2941

TTCCCCGGGGAGTGCCAGTACGTTCTGGTGCAGGATTACTGCGGCAGTAACCCTGGGACC
FPGECQYVLVQDYCGSNPGT

2971 . . 3001

TTTCGGATCCTAGTGGGGAATAAGGGATGCAGCCACCCCTCAGTGAAATGCAAGAAACGG
FRILVGNKGCSHPSVKCKKR

3031 . . 3061

GTCACGATCCTGGTGGAGGGAGGAGAGATTGAGCTGTTTGACGGGGAGGTGAATGTGAAG
VTILVEGGEIELFDGEVNVK

3091 . . 3121

AGGCCCATGAAGGATGAGACTCACTTTGAGGTGGTGGAGTCTGGCCGGTACATCATTCTG
RP*KDETHFEVVESGRYIIL

3151 . . 3181

CTGCTGGGCAAAGCCCTCTCCGTGGTCTGGGACCGCCACCTGAGCATCTCCGTGGTCCTG
LLGKALSVVWDRHLSISVVL

3211 . . 3241

AAGCAGACATACCAGGAGAAAGTGTGTGGCCTGTGTGGGAATTTTGATGGCATCCAGAAC
KQTYQEKVCGLCGNFDGIQN

3271 . . 3301

AATGACCTCACCAGCAGCAACCTCCAAGTGGAGGAGGACCCTGTGGACTTTGGGAAGTCC
NDLTSSNLQVEEDPVDFGKS

3331 . . 3361

TGGGAAGTGAGCTCGCAGTGTGCTGACACCAGAAAAGTGCCTCTGGACTCATCCCCTGCC
WEVSSQCADTRKVPLDSSPA

3391 . . 3421

ACCTGCCATAACAACATCATGAAGCAGACGATGGTGGATTCCTCCTGTAGAATCCTTACC
TCHNNI*KQT*VDSSCRILT

3451 . . 3481

AGTGACGTCTTCCAGGACTGCAACAAGCTGGTGGACCCCGAGCCATATCTGGATGTCTGC
SDVFQDCNKLVDPEPYLDVC

3511 . . 3541

ATTTACGACACCTGCTCCTGTGAGTCCATTGGGGACTGCGCCTGCTTCTGCGACACCATT
IYDTCSCESIGDCACFCDTI
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L C . 3721 

3811 



E G C H 



E L 



rFAS gkkvtu 

3961 

CHCDVVNL.iv- 
uppTDArV 

4BB1 

TCCT 3C a 33C T 3,3C3 0 333 T 3 r ^^ r T3„«333 fi 3T TFT 3,33j33 S C^3 
4531 . . 4561

ATCCGCCTCATCGAGAAGCAGGCCCCTGAGAACAAGGCCTTCGTGCTGAGCAGTGTGGAT
IRLIEKQAPENKAFVLSSVD

4591 . . 4621

GAGCTGGAGCAGCAAAGGGACGAGATCGTTAGCTACCTCTGTGACCTTGCCCCTGAAGCC
ELEQQRDEIVSYLCDLAPEA

4651 . . 4681

CCTCCTCCTACTCTGCCCCCCGACATGGCACAAGTCACTGTGGGCCCGGGGCTCTTGGGG
PPPTLPPD*AQVTVGPGLLG

4711 . . 4741

GTTTCGACCCTGGGGCCCAAGAGGAACTCCATGGTTCTGGATGTGGCGTTCGTCCTGGAA
VSTLGPKRNS*VLDVAFVLE

4771 . . 4801

GGATCGGACAAAATTGGTGAAGCCGACTTCAACAGGAGCAAGGAGTTCATGGAGGAGGTG
GSDKIGEADFNRSKEF*EEV

4831 . . 4861

ATTCAGCGGATGGATGTGGGCCAGGACAGCATCCACGTCACGGTGCTGCAGTACTCCTAC
IQR*DVGQDSIHVTVLQYSY

4891 . . 4921

ATGGTGACCGTGGAGTACCCCTTCAGCGAGGCACAGTCCAAAGGGGACATCCTGCAGCGG
*VTVEYPFSEAQSKGDILQR

4951 . . 4981

GTGCGAGAGATCCGCTACCAGGGCGGCAACAGGACCAACACTGGGCTGGCCCTGCGGTAC
VREIRYQGGNRTNTGLALRY

5011 . . 5041

CTCTCTGACCACAGCTTCTTGGTCAGCCAGGGTGACCGGGAGCAGGCGCCCAACCTGGTC
LSDHSFLVSQGDREQAPNLV

5071 . . 5101

TACATGGTCACCGGAAATCCTGCCTCTGATGAGATCAAGAGGCTGCCTGGAGACATCCAG
Y*VTGNPASDEIKRLPGDIQ

5131 . . 5161

GTGGTGCCCATTGGAGTGGGCCCTAATGCCAACGTGCAGGAGCTGGAGAGGATTGGCTGG
VVPIGVGPNANVQELERIGW

5191 . . 5221

CCCAATGCCCCTATCCTCATCCAGGACTTTGAGACGCTCCCCCGAGAGGCTCCTGACCTG
PNAPILIQDFETLPREAPDL

5251 . . 5281

GTGCTGCAGAGGTGCTGCTCCGGAGAGGGGCTGCAGATCCCCACCCTCTCCCCAGCACCT
VLQRCCSGEGLQIPTLSPAP

5311 . . 5341

GACTGCAGCCAGCCCCTGGACGTGATCCTTCTCCTGGATGGCTCCTCCAGTTTCCCAGCT
DCSQPLDVILLLDGSSSFPA



PRLTOVSVL.U 

5521 

^^-rTrArrrTTSTG5ACETCAT6CAGCGG6AG 
TGG^CGTBGTCCCGGAGAAAGCCCATTTuCTGA.CCTTG^^ y , Q R E 

UNVVPEKAHV- 

55B1 
TGACTTCAGAA 
S E 



G GPSQIG DAL 

5641 

1 G 1 5821 

AATTCCTTCCTCCACflAACTeT6CTCTQGATTTGTTAeGATTTGCAT6SATGftSGATGGG 

N S F L » K L G G G F 
AATGAGAAGAGGCCCGGGGACGTCTGGACCTTGCCAGACCAGTGCCACACCGTGACTTGC 

N E K A F G D V V, T L 

CAGCCAGATGGCCAGACCTTGCTGAAGAGTCATC6GGTCAACTGTGACCG6GGGCTGAGG 

U . 6061 

cLcgtoccgtaagagcgagtggcg^^ 

P 6121 

acctgcccctgcgtgtgcacaggcagctccactcggcacatcgtgacctttgatgggcag 

T C p C « C T G 
AATTTCAAGCTGACTGGCAGCTGTTCTTATGTCCTATTTCAAAACAAGGAGCAGGACCT6 
N F K L T G S C G V V 

GAGGTGATTCTCCATAATGGTGCCTGCAGCCCTGGAGCAAGGCAGGGCTGCATGAAATCC 

E V 1 L . 63B1 

ATCGAGGTGAAGCACAGTGCCCTCTCCG^CGAGCTGCACAGTGACATGGAGGTGACGGTG 
6331 . . 6361

AATGGGAGACTGGTCTCTGTTCCTTACGTGGGTGGGAACATGGAAGTCAACGTTTATGGT
NGRLVSVPYVGGN*EVNVYG

6391 . . 6421

GCCATCATGCATGAGGTCAGATTCAATCACCTTGGTCACATCTTCACATTCACTCCACAA
AI*HEVRFNHLGHIFTFTPQ

6451 . . 6481

AACAATGAGTTCCAACTGCAGCTCAGCCCCAAGACTTTTGCTTCAAAGACGTATGGTCTG
NNEFQLQLSPKTFASKTYGL

6511 . . 6541

TGTGGGATCTGTGATGAGAACGGAGCCAATGACTTCATGCTGAGGGATGGCACAGTCACC
CGICDENGANDF*LRDGTVT

6571 . . 6601

ACAGACTGGAAAACACTTGTTCAGGAATGGACTGTGCAGCGGCCAGGACAGACGTGCCAG
TDWKTLVQEWTVQRPGQTCQ

6631 . . 6661

CCCATCCTGGAGGAGCAGTGTCTTGTCCCCGACAGCTCCCACTGCCAGGTCCTCCTCTTA
PILEEQCLVPDSSHCQVLLL

6691 . . 6721

CCACTGTTTGCTGAATGCCACAAGGTCCTGGCTCCAGCCACATTCTATGCCATCTGCCAG
PLFAECHKVLAPATFYAICQ

6751 . . 6781

CAGGACAGTTCGCACCAGGAGCAAGTGTGTGAGGTGATCGCCTCTTATGCCCACCTCTGT
QDSSHQEQVCEVIASYAHLC

6811 . . 6841

CGGACCAACGGGGTCTGCGTTGACTGGAGGACACCTGATTTCTGTGCTATGTCATGCCCA
RTNGVCVDWRTPDFCA*SCP

6871 . . 6901

CCATCTCTGGTCTACAACCACTGTGAGCATGGCTGTCCCCGGCACTGTGATGGCAACGTG
PSLVYNHCEHGCPRHCDGNV

6931 . . 6961

AGCTCCTGTGGGGACCATCCCTCCGAAGGCTGTTTCTGCCCTCCAGATAAAGTCATGTTG
SSCGDHPSEGCFCPPDKV*L

6991 . . 7021

GAAGGCAGCTGTGTCCCTGAAGAGGCCTGCACTCAGTGCATTGGTGAGGATGGAGTCCAG
EGSCVPEEACTQCIGEDGVQ

7051 . . 7081

CACCAGTTCCTGGAAGCCTGGGTCCCGGACCACCAGCCCTGTCAGATCTGCACATGCCTG
HQFLEAWVPDHQPCQICTCL

7111 . . 7141

AGCGGGCGGAAGGTCAACTGCACAACGCAGCCCTGCCCCACGGCCAAAGCTCCCACGTGT
SGRKVNCTTQPCPTAKAPTC
7171 . . 7201

GGCCTGTGTGAAGTAGCCCGCCTCCGCCAGAATGCAGACCAGTGCTGCCCCGAGTATGAG
GLCEVARLRQNADQCCPEYE






8131 . . 8161

AACCCCTGCCCCCTGGGTTACAAGGAAGAAAATAACACAGGTGAATGTTGTGGGAGATGT
NPCPLGYKEENNTGECCGRC

8191 . . 8221

JTGCCT ACGGCTTGC ACCATTCAGCT AAGAGGAGGACAGATCATG ACACTGAAGCGTGAT 
LPTACTIQLRGGQI*TLKRD

8251 . . 8281

GAGACGCTCCAGGATGGCTGTGATACTCACTTCTGCAAGGTCAATGAGAGAGGAGAGTAC
ETLQDGCDTHFCKVNERGEY

8311 . . 8341

TTCTGGGAGAAGAGGGTCACAGGCTGCCCACCCTTTGATGAACACAAGTGTCTGGCTGAG
FWEKRVTGCPPFDEHKCLAE

8371 . . 8401

GGAGGTAAAATTATGAAAATTCCAGGCACCTGCTGTGACACATGTGAGGAL TGAGTCC 
GGK I *KI PGTCCDTCEEPES 

8431 . . 8461

AACGACATCACTGCCAGGCTGCAGTATGTCAAGGTGGGAAGCTGTAAGTCTGAAGTAGAG
NDITARLQYVKVGSCKSEVE

8491 . . 8521

GTGGATATCCACTACTGCCAGGGCAAATGTGCCAGCAAAGCCATGTACTCCATTGACATC
VDIHYCQGKCASKA*YSIDI

8551 . . 8581

AACGATGTGCAGGACCAGTGCTCCTGCTGCTCTCCGACACGGACGGAGCCCATGCAGGTG
NDVQDQCSCCSPTRTEP*QV

8611 . . 8641

GCCCTGCACTGCACCAATGGCTCTGTTGTGTACCATCAGGTTCTCAATGCCATGGAGTGC
ALHCTNGSVVYHQVLNA*EC

8671 . . 8701

AAATGCTCCCCCAGGAAGTCCAGCAAGTGAGGCTGCTGCAGCTGCATGGGTGCCTGCTGC
KCSPRKSSK +

8731 . . 8761

TGCCTGCCTTGGCCTGATGGCCAGGCCAGAGTGCTGCCAGTCCTCTGCATGTTCTGCTCT

8791

TGTGCCCTTCTGAGCCCACAATAAAGGCTGAGCTCTTATCTTGCA